ABSTRACT

Our knowledge of prokaryotic defense systems has 
vastly expanded as the result of comparative 
genomic analysis, followed by experimental valid- 
ation. This expansion is both quantitative, including 
the discovery of diverse new examples of known 
types of defense systems, such as restriction- 
modification or toxin-antitoxin systems, and qualita- 
tive, including the discovery of fundamentally new 
defense mechanisms, such as the CRISPR-Cas 
immunity system. Large-scale statistical analysis 
reveals that the distribution of different defense 
systems in bacterial and archaeal taxa is non-uniform, 
with four groups of organisms distinguishable with 
respect to the overall abundance and the balance 
between specific types of defense systems. The 
genes encoding defense system components in bacteria and archaea typically cluster in defense islands.
In addition to genes encoding known defense 
systems, these islands contain numerous unchar- 
acterized genes, which are candidates for new types 
of defense systems. The tight association of the 
genes encoding immunity systems and dormancy- 
or cell death-inducing defense systems in prokaryotic 
genomes suggests that these two major types of 
defense are functionally coupled, providing for effect- 
ive protection at the population level. 

INTRODUCTION 

The arms race between viruses and their hosts is arguably the most powerful and relentless driving force in evolution (1-3). As a result, numerous, extremely diverse and elaborate antiviral defense systems have evolved and occupy a substantial part of the genome, especially in free-living archaea and bacteria (4,5). Although some of these systems have been known for many years and have been thoroughly characterized, recent advances in comparative genomics and in the experimental study of virus-host interactions have revealed many new antiviral defense mechanisms (5-8).

The defense systems of prokaryotes can be classified 
into two broad groups that differ in their modes of 
action. The first group includes those defense systems 
that function on the self-non-self discrimination principle, 
with DNA usually being the target of the discriminatory 
recognition; these defense mechanisms can be viewed as 
prokaryotic immunity. At least three types of defense 
systems and their derivatives belong to this group. The 
best characterized of these are the extremely numerous and diverse restriction-modification (R-M) systems, which use methylation to label the 'self' genomic DNA and recognize and cleave any unmodified 'non-self' DNA (9-11). Another defense system in this group is DNA phosphorothioation (known as the DND system), which labels DNA by sulfur modification of the DNA backbone and destroys unmodified DNA (8,12,13). The R-M and DND systems represent the prokaryotic version of innate immunity.

Unlike R-M and DND systems, which attack non-self invaders indiscriminately, the CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)-Cas (CRISPR-associated genes) system is able to memorize encounters with infectious agents and attack them specifically afterwards (14-18). Thus, CRISPR-Cas is often viewed as a prokaryotic adaptive immunity system.

The second group of defense systems is generally based 
on programmed cell death or dormancy induced by infec- 
tion. Numerous and diverse toxin-antitoxin (TA) systems 
belong in this category. Depending on the nature of the toxins and antitoxins, TA systems are currently classified into three types: type I, with an antisense RNA as the antitoxin and a protein, usually a small membrane holin-like protein, as the toxin; type II, in which both toxin and antitoxin are proteins; and type III, in which the RNA antitoxin directly inactivates the protein toxin (7,19-28). Two additional types of TA systems (IV and V) have been recently proposed on the basis of distinct mechanisms of action of the respective antitoxins (29,30). In addition to the TA systems, abortive infection (ABI) or phage exclusion systems also often use the mechanism of cell death or dormancy. These systems have not so far been classified in detail, but some of them fit well into the TA system description (31). The vast majority of toxins in both TA systems and ABI systems interfere with translation, mostly via mRNA or tRNA cleavage.

Numerous recent comparative genomic studies not only revealed the high abundance of the known defense systems and predicted new ones, whose molecular mechanisms of action remain to be characterized, but also highlighted several distinct properties of these systems.

• The genes encoding different defense systems often cluster in genomic islands that are larger than a typical operon.

• The immunity systems are often encoded within the same genomic loci as systems that cause cell death or dormancy, and, at least in some cases, the two classes of defense systems functionally cooperate.

• Different families of toxins and antitoxins often recombine to form (almost) all possible TA pairs.

• Defense systems or their components sometimes change their mode of action. For example, R-M systems can switch to the functional mode characteristic of TA systems, whereas individual components of TA systems can act solo as ABI systems.

The purpose of this article is to examine these recent 
observations in some detail and to focus on several 
recently predicted and still poorly characterized defense 
systems of bacteria and archaea. The functions and com- 
parative genomics of well-characterized prokaryotic 
defense systems such as R-M, TA and CRISPR-Cas 
have been discussed in detail in multiple reviews; there- 
fore, here, we only include brief summaries of the pertin- 
ent features of these systems. 



DISTRIBUTION OF DEFENSE SYSTEMS IN 
ARCHAEA AND BACTERIA AND FOUR 
DISTINCT DEFENSE STRATEGIES INFERRED 
FROM GENOME ANALYSIS 

The fraction of bacterial and archaeal genomes allotted to defense systems varies broadly, from virtual absence to ~10% (Figure 1A). These distributions reflect the lower bound for each type of defense system because many more instances undoubtedly remain to be discovered, as discussed in the rest of this article. The overall abundance of defense systems shows nearly perfect linear scaling with genome size (5). The number of TA genes generally increases faster than linearly (as a power of ~1.3 of the total number of genes), ABI system genes take up an approximately constant fraction of the genome (~1 per 1000 genes), and R-M genes scale sublinearly with genome size (power of ~0.75) (Figure 1B). The abundance of CRISPR-Cas systems is statistically the same in large and small genomes. The differential scaling with genome size implies that it is most appropriate to analyse the abundance of defense system genes relative to the expected abundance, given the host genome size.
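The scaling relationships described above can be made concrete with a short sketch. It assumes a power law, expected count = a * N**b, between the number of genes of a given defense type and the total number of genes N, fits a and b by ordinary least squares in log-log space, and scores a genome by the log ratio of observed to expected counts. The function names and the synthetic numbers (prefactor 0.002, exponent 1.3) are hypothetical; the analysis in (5) may have used a different fitting procedure, and genomes with zero genes of a given type would need a pseudocount before logs are taken.

```python
import math

def fit_power_law(total_genes, defense_genes):
    """Fit defense_genes ~ a * total_genes**b by least squares on log-transformed data.

    Both inputs must contain strictly positive numbers (log of 0 is undefined).
    Returns the prefactor a and the scaling exponent b.
    """
    xs = [math.log(n) for n in total_genes]
    ys = [math.log(d) for d in defense_genes]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope in log-log space = scaling exponent
    a = math.exp(mean_y - b * mean_x)  # intercept back-transformed to a prefactor
    return a, b

def log_enrichment(observed, total, a, b):
    """Log of observed / expected defense genes for a genome with `total` genes."""
    expected = a * total ** b
    return math.log(observed / expected)

# Synthetic example: counts generated exactly from y = 0.002 * N**1.3 (hypothetical values)
sizes = [1000, 2000, 4000, 8000]
counts = [0.002 * s ** 1.3 for s in sizes]
a, b = fit_power_law(sizes, counts)
print(round(b, 3))  # -> 1.3
# A genome carrying twice the expected number of genes scores log(2)
print(round(log_enrichment(2 * 0.002 * 3000 ** 1.3, 3000, a, b), 3))  # -> 0.693
```

Working with log ratios puts over- and under-representation on a symmetric scale (a twofold excess and a twofold deficit give +0.693 and -0.693), which is what allows different defense types to be summed or subtracted when genomes are compared.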



The immediate outcome of the analysis of the distribu- 
tion of defense genes is their pronounced enrichment in 
archaea compared with bacteria and in thermophiles 
(especially hyperthermophiles) compared with mesophiles 
and psychrophiles (5). The two trends, the dependence on taxonomy and on temperature preference, appear to be independent of each other. A deeper analysis of the distribution of the relative abundances of genes belonging to different defense systems reveals four distinct clusters of organisms in a principal component-like space (Figure 2), as indicated by gap function analysis (32). This observation implies the existence of four distinct 'defense strategies': (i) all defense systems are under-represented relative to their expected abundance, that is, in the respective organisms, defense is either abandoned altogether or reduced to a bare-bones minimum; (ii) the total number of genes dedicated to defense is close to the expected value, with R-M and ABI prevailing over TA and CRISPR; (iii) the total number of genes dedicated to defense is close to the expected value, with TA and CRISPR prevailing over R-M and ABI; and (iv) all defense systems are over-represented, i.e. a greater than average fraction of the genome is dedicated to antivirus defense (Figure 2A).
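The two axes of Figure 2 and the four strategies can be illustrated with a simple rule-based sketch. The inputs are per-type log ratios of observed to expected gene counts, given genome size. These are combined into the horizontal axis (their sum) and the vertical axis, (TA + CRISPR) - (R-M + ABI). The assignment rule and the cut-off `threshold` are hypothetical simplifications. In the original analysis, the clusters were identified by gap function analysis (32), not by fixed thresholds, and the total on the horizontal axis is used here only as a proxy for "all systems under- or over-represented".

```python
def defense_axes(log_ratios):
    """Project per-type log(observed/expected) values onto the two axes of Figure 2.

    log_ratios: dict with keys 'TA', 'CRISPR', 'RM', 'ABI'.
    Returns (total, balance): the horizontal and vertical coordinates.
    """
    total = sum(log_ratios[k] for k in ("TA", "CRISPR", "RM", "ABI"))
    balance = (log_ratios["TA"] + log_ratios["CRISPR"]) - (log_ratios["RM"] + log_ratios["ABI"])
    return total, balance

def defense_strategy(log_ratios, threshold=1.0):
    """Assign strategy 1-4 using a hypothetical fixed threshold on the total axis."""
    total, balance = defense_axes(log_ratios)
    if total < -threshold:
        return 1  # proxy for: all systems under-represented (little or no defense)
    if total > threshold:
        return 4  # proxy for: all systems over-represented
    # Total close to expectation: the prevailing pair of systems decides
    return 3 if balance > 0 else 2  # 3: TA + CRISPR prevail; 2: R-M + ABI prevail

print(defense_strategy({"TA": 0.8, "CRISPR": 0.5, "RM": -0.6, "ABI": -0.4}))  # -> 3
```

In this example, the summed log ratio (0.3) is close to zero, so the genome carries about the expected amount of defense overall, and the positive balance places it in strategy (iii).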

An overwhelming majority of bacterial thermophiles, 
along with the archaea, regardless of the optimal growth 
temperature, follow strategies (iii) or (iv), including a 
general over-representation of defense system genes 
(Figure 2B and C). Bacteria are widely spread across the 
entire parameter space, with most of the large bacterial 
groups showing a range of defense strategies among the 
representative genomes (Figure 2C). 

Certainly, one has to keep in mind that the aforemen- 
tioned partitioning of the archaeal and bacterial defense 
strategies is conditioned on our ability to identify defense 
systems by genome analysis. In particular, assignment of 
an organism to the first strategy (no or little defense) could 
be somewhat naive in the sense that some of these organ- 
isms might use completely uncharacterized novel defense 
systems. This concern is minor when it comes to parasitic 
or symbiotic organisms with very small genomes to which 
this strategy (or perhaps more precisely, lack of defense 
strategy) trivially applies. However, extreme paucity of 
identifiable defense systems has been noted also for 
some bacteria with large genomes, e.g. Paenibacillus sp.,
with a genome of more than seven megabases (5). In 
these cases, the potential unknowns loom large, and it is 
a question of major interest whether the lifestyle of these 
organisms renders defense systems superfluous or favours 
novel defense mechanisms. 



DEFENSE ISLANDS 

Many cases of clustering of defense genes on the chromo- 
somes have been described (27,33,34) as well as involve- 
ment of transposable elements in horizontal transfer of 
defense genes (35-37), indicating high mobility and pref- 
erential attachment of these systems. Thus, unlike other 
functional groups of bacterial and archaeal genes (such as 
sugar metabolism, energy metabolism, etc.), defense 
systems and mobilome-related genes, such as prophages, 
form clusters whose size by far exceeds that of typical operons and that are unlikely to appear by chance. Statistically significant over-clustering of different defense systems has been demonstrated (5). Briefly, many defense operons tend to lie in closer physical proximity to each other on the chromosome than expected by chance [see (5) for details]. This finding suggests the possibility of synergistic interactions between different types of defense systems. Although there is currently no unequivocal definition of defense islands and no clear understanding of the mechanism(s) of their formation, a simple operational definition has been proposed. A defense island is defined as a string of contiguous genes, at least one of which belongs to a known defense gene family, that is flanked by housekeeping genes; such islands are significantly enriched in defense and mobilome-related genes compared with analogous blocks formed by other genomic systems (5). The percentage of genes found in defense islands varies from 0 to 30% across the current collection of prokaryotic genomes (Figure 1A) (5). The greatest fraction of the genome dedicated to antiviral defense was detected in the cyanobacterium Microcystis aeruginosa, the proteobacterium Bartonella tribocorum and the bacteroidetes bacterium Pelodictyon phaeoclathratiforme. The detection of an extreme abundance of defense systems in taxonomically scattered bacteria implies that such over-representation is not lineage-specific but is perhaps dictated by the ecology of the respective organisms, which might be subject to an unusually massive assault by invasive agents (5).

This simple operational definition of defense islands has proved extremely useful for the prediction of new defense systems (5) and for understanding the cooperation between them (see later in the text). Figure 3 shows several examples of defense islands that are specifically enriched in genes from different defense systems and include several still experimentally uncharacterized genes that are implicated in antivirus defense.

Figure 1. The major types of defense systems in bacterial and archaeal genomes. (A) Distribution (probability density function) of the genome fraction occupied by defense systems in bacteria and archaea. (B) Scaling of the number of genes in defense systems with the total number of genes. A data set of 572 genomes (the largest genome in each genus, with the addition of E. coli K12 and B. subtilis subsp. subtilis) was selected to represent the 1516 genomes that were completely sequenced and available through the NCBI Genome database as of February 2012.



DEFENSE MECHANISMS IN BACTERIA 
AND ARCHAEA 

Innate immunity: DNA modification systems 

The R-M systems are probably the best studied phage defense mechanism in bacteria, owing to the extensive application of restriction endonucleases in molecular biology (9-11). Because of this practical importance, as well as the extreme diversity in the genomic organization and protein domain architecture of the R-M systems, detailed rules for restriction enzyme classification and nomenclature have been developed (38). This classification divides the R-M systems into four major types (I-IV) on the basis of subunit composition, ATP (GTP) requirement and cleavage mechanism (39-41). All R-M systems function on the same principle of self-non-self discrimination, with one enzyme, a methyltransferase (MTase), modifying the self DNA and the other, a restriction endonuclease (REase), cleaving non-methylated foreign DNA (38,42). Type II R-M systems are the simplest and by far the most common, and they are the ones mostly used for experimental applications because these enzymes cleave the target DNA at highly specific sites. The Type II R-M systems have been further classified into several subtypes, primarily on the basis of cleavage specificity (41). The Type II systems consist solely of the MTase-REase pair, which is typically encoded within the same operon, although some cases of apparently disjoint localization of the two genes have been reported (43). The most complex, ATP-dependent Type I R-M systems encompass three genes, which encode the R (restriction), M (modification) and S (specificity) subunits of the R-M complex; the R subunit also contains a distinct ATPase domain that belongs to helicase Superfamily II (42,44,45). Type III R-M systems resemble Type II systems in that they consist of only R and M subunits but, on the other hand, are similar to Type I systems in that the R subunit also contains the helicase domain and the reaction is ATP-dependent (46,47). Type IV R-M systems are distinct two-subunit complexes that consist of an AAA+ family GTPase and an endonuclease, and they cleave the target DNA non-specifically (45,48).
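The self-non-self principle shared by all R-M systems can be illustrated with a toy model of a Type II system. The sketch assumes the EcoRI recognition site GAATTC, which this enzyme cuts between G and A. Methylation is reduced to a set of marked site positions (real MTases modify a specific base within the site), and the helper names `find_sites`, `methylate` and `restrict` are hypothetical.

```python
SITE = "GAATTC"  # EcoRI recognition site; EcoRI cuts G^AATTC (offset 1)

def find_sites(seq, site):
    """Return the start positions of every occurrence of `site` in `seq`."""
    positions, i = [], seq.find(site)
    while i != -1:
        positions.append(i)
        i = seq.find(site, i + 1)
    return positions

def methylate(seq, site):
    """Toy MTase activity: mark every recognition site in 'self' DNA as modified."""
    return set(find_sites(seq, site))

def restrict(seq, site, methylated, cut_offset):
    """Toy REase activity: cut only at recognition sites that are NOT methylated."""
    cuts = [p + cut_offset for p in find_sites(seq, site) if p not in methylated]
    fragments, prev = [], 0
    for c in cuts:
        fragments.append(seq[prev:c])
        prev = c
    fragments.append(seq[prev:])
    return fragments

host = "TTGAATTCAAGAATTCGG"
host_marks = methylate(host, SITE)            # the host modifies its own sites
print(restrict(host, SITE, host_marks, 1))    # -> ['TTGAATTCAAGAATTCGG'] (self DNA spared)
phage = "CCGAATTCAA"                          # incoming DNA carries no host methylation
print(restrict(phage, SITE, set(), 1))        # -> ['CCG', 'AATTCAA'] (non-self DNA cleaved)
```

The same sequence is protected or destroyed depending only on its modification state, which is the essence of self-non-self discrimination in these systems.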

Many genomic loci that encompass R-M systems of all four major types also include variable groups of additional genes that appear to be co-expressed with the genes for R-M system subunits (5) (Figure 3). Although most of these genes have not been experimentally characterized, one such case has been studied in considerable detail and presents a remarkable example of the interplay between different defense mechanisms. The Escherichia coli anticodon nuclease (ACNase) prrC co-localizes with three genes of the Type Ic R-M system prrI and contributes to the T4 phage exclusion mechanism (49-51). This genomic association, which is conserved in diverse bacteria, also implies a functional connection. The PrrC nuclease, normally inactive, can be allosterically activated either by unmodified DNA or by the small anti-restriction peptide encoded by T4-like enterobacteriophages. The activated PrrC ACNase cleaves the anticodon of tRNA(Lys) in a GTP-dependent manner; the GTP hydrolysis is catalysed by the N-terminal ABC NTPase domain of PrrC. The cleavage of tRNA(Lys) inhibits host translation and, as a consequence, the reproduction of the T4 phage. The RloC enzyme, which is homologous to PrrC, does not seem to be linked to R-M systems, has similar biochemical properties and is activated under genotoxic stress (52,53). Recent analysis has shown that the ACNase domain of both proteins belongs to the HEPN superfamily, which is emerging as a major group of ribonucleases involved in various forms of defense and stress response (54,55).

Figure 2. Distribution of known and predicted defense systems in archaeal and bacterial genomes. (A) The four 'defense strategies'. Here, 1-4 refers to the four strategies discussed in the text. The axes show logs of the ratios of the numbers of genes belonging to a given type of defense system to the number expected from the scaling shown in Figure 1B. The horizontal axis is the sum of the logs for all four types and the vertical axis is (TA + CRISPR) - (R-M + ABI). (B) Defense strategies used by bacterial and archaeal thermophiles and mesophiles. BT, AT, BM and AM stand for bacterial thermophiles, archaeal thermophiles, bacterial mesophiles and archaeal mesophiles, respectively. The axes show logs of the ratios of the numbers of genes belonging to a given type of defense system to the number expected from the scaling shown in Figure 1B. The horizontal axis is the sum of the logs for all four types and the vertical axis is (TA + CRISPR) - (R-M + ABI). (C) Distribution of the defense strategies among major prokaryotic taxa. Here, 1-4 refers to the four strategies discussed in the text. The number of analysed genomes for each taxon is indicated inside the respective bar. The expected abundance of genes belonging to the defense systems of each type in a given genome was calculated from the genome size using the observed scaling relationships (Figure 1B). Logarithms of the ratios of the observed and expected frequencies of defense system genes in genomes were analysed using Principal Component Analysis; the data were then projected onto the space of two orthogonal axes with integer coefficients closest to the first principal components.

Site-specific S-modification of the DNA backbone, cleavage of unmodified DNA and the dndABCDE genes involved in this system (named after the DNA degradation phenotype; these genes are alternatively designated dpt, for DNA phosphorothioation) were first discovered in Streptomyces lividans 1326. Five additional genes (dndFGHI) that are strongly linked to this system have been found by analysis of the genomic neighbourhoods (12,13). Recently, the genes required for modification (dndABCDE) and restriction (dndFGH) have been identified in the related system from Salmonella enterica serovar Cerro 87 (8). The structures and biochemical activities of the DndA and DndC proteins, which are directly involved in S-modification, are relatively well understood (56,57), whereas the functions of the other genes associated with this system are less clear. Moreover, the neighbourhood around the genes that comprise this system is highly flexible, including the cysteine desulfurase gene dndA, which often is not linked to the other dnd genes (8). Here, we present results of additional sequence and gene context analysis of these genes that show a strong link of several components of the DND systems with ABI and TA systems (Supplementary Table S1). For instance, DndB, the potential negative regulator of restriction (13,58), contains an N-terminal region that belongs to the ABI protein family AbiUl/AIPR/COG1479, which encompasses a ParB superfamily nuclease domain that is often fused to other nuclease domains from different families and linked to R-M systems (55,59). In DndB, the ParB-like domain is additionally fused to a HEPN domain. A distinct HEPN domain from a different subfamily (DUF4145) is fused to the DndF NTPase. Domains of the latter subfamily are often fused to REase components of Type I R-M systems (55).



The third DNA modification system, the Phage Growth Limitation (Pgl) system, is so far poorly characterized experimentally. The Pgl system is centred around the PglZ protein family, in which the only recognizable domain belongs to the alkaline phosphatase superfamily (pfam08665) (60). The scarce experimental evidence indicates that PglZ confers protection against the temperate bacteriophage phiC31 in Streptomyces coelicolor A3(2) (61,62). This system also includes the P-loop ATPase domain-containing protein PglY, the methylase PglW and the serine-threonine kinase PglX (the latter two proteins are encoded in a different locus in the S. coelicolor genome). The bacteria that possess the Pgl system support a phage burst on initial infection, but subsequent phage growth cycles are severely restricted (62). Although the molecular mechanism of the Pgl system has not been experimentally elucidated, it has been hypothesized that the system methylates the DNA of the phage progeny rather than the host DNA, so that on re-infection, the surviving cells in the same Streptomyces colony could activate the system and prevent phage growth (61,62). Thus, the Pgl system might function via a reverse R-M mechanism, combining the self-non-self discrimination and virus-induced cell death modes of antivirus defense in a novel defense strategy. The recent comparative analysis of the neighbourhoods of the pglZ gene revealed a complexity of genetic organization comparable only to that of the CRISPR-Cas system (see later in the text) (5). Supplementary Table S2 lists the gene families that are associated with the pglZ gene. One of these families is COG1479 (or DUF262 or DGQHR domain), which has been previously identified within the Type I R-M system locus in Campylobacter jejuni (63). The core domain of the COG1479 family belongs to the ParB-like superfamily and is often fused to other nucleases, such as the HNH-type nuclease domain, the PD-(D/E)xK-like nuclease and the HEPN domain, suggesting that it might be another case of a programmed cell-death system associated with various DNA modification systems (5). Based on the presence of the pglZ gene, this system is found in 174 of the 1516 completely sequenced genomes, which represent most of the major bacterial lineages and several methanogenic and halophilic archaea. The remarkable complexity of the Pgl system seems to reflect a still poorly understood, elaborate molecular mechanism of self-non-self discrimination and fine-tuned regulation.
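The hypothesized 'reverse R-M' logic of Pgl (61,62) can be summarized as a toy state model. This is a sketch under stated assumptions, not a description of a characterized mechanism. A phage is reduced to a single hypothetical flag, `modified`, which records whether its DNA was marked by a Pgl+ host, and the function `infect` is illustrative only. The model also assumes that a Pgl- host releases unmarked progeny.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Phage:
    modified: bool  # hypothetical flag: DNA was marked (methylated) by a Pgl+ host

def infect(phage: Phage, host_has_pgl: bool) -> Tuple[bool, Optional[Phage]]:
    """Toy model of the hypothesized 'reverse R-M' behaviour of Pgl.

    Returns (burst, progeny): whether the phage completes a growth cycle and
    the progeny phage released (None when growth is blocked).
    """
    if host_has_pgl and phage.modified:
        # Re-infection of a Pgl+ cell by phage marked in a Pgl+ host: growth is prevented
        return False, None
    # Otherwise the phage grows; a Pgl+ host marks the progeny DNA,
    # whereas a Pgl- host (assumption) releases unmarked progeny
    return True, Phage(modified=host_has_pgl)

burst1, progeny = infect(Phage(modified=False), host_has_pgl=True)
burst2, _ = infect(progeny, host_has_pgl=True)
print(burst1, burst2)  # -> True False
```

The two calls reproduce the observed phenotype: a burst on the initial infection, followed by restriction of the next growth cycle in neighbouring Pgl+ cells.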

Adaptive immunity: the CRISPR-Cas system 

The CRISPR-Cas system uses a unique defense mechanism that involves incorporation of virus DNA fragments into CRISPR repeat arrays and subsequent utilization of transcripts of these inserts (spacers) as guide RNAs to cleave the cognate virus genome (34,64-67). Thus, the CRISPR-Cas system represents bona fide adaptive immunity, which until recently had not been discovered in prokaryotes, and, moreover, is the most clear-cut known case of Lamarckian inheritance (68). The role in antiviral defense that was initially predicted for this system, on the basis of the detection of spacers identical to fragments of virus and plasmid genomes and comparative analysis of Cas protein sequences, has been successfully confirmed experimentally (69). Within the few years since this key breakthrough, CRISPR research has evolved into a distinct, highly dynamic field of microbiology with considerable biotechnology potential (70-73). The recent advances in the study of CRISPR-Cas systems are covered in many reviews (15,74-76); therefore, here we present only a brief outline of the functions and comparative genomics of prokaryotic adaptive immunity and discuss the likely scenarios for the evolution of the different types of CRISPR-Cas.

Figure 3. Examples of defense islands in archaeal and bacterial genomes. The genes are shown by block arrows with the size roughly proportional to the size of the corresponding gene. The genomic position of each region is given in parentheses after the species name, as the range of genes denoted by the systematic names for the respective species. Colour coding is as follows: pink, components of TA systems; red, components of CRISPR-Cas systems; dark blue, Pgl system; light blue, regulatory components; green, R-M systems; yellow, ABI system; orange, pAgo; brown, components that are predicted to be involved in defense; grey, unknown protein. The protein family or domain names are provided above the respective arrows; some of these families were recently introduced and described in the course of comparative genomic analysis of defense islands (5); COG or Pfam families are indicated in parentheses. Pgl, Phage Growth Limitation; HTH, helix-turn-helix; RHH, ribbon-helix-helix; GIY-YIG, conserved motif in a nuclease family.

The CRISPR-Cas systems are classified into three 
distinct types (I, II and III) (18) and several yet unclassi- 
fied minor variants (77). This classification was developed 
through a combination of comparisons of the sequences of 
the Cas proteins, cas gene repertoires and genomic organ- 
ization of the CRISPR-Cas loci. For each type and 
subtype, a specific signature gene has been identified 
allowing easy classification of the highly variable 
CRISPR-Cas loci in the course of genome analysis (18). 
The mechanism of CRISPR-Cas is usually divided into three stages: (i) adaptation, when new spacers homologous to protospacer sequences in viral genomes or other alien DNA molecules are integrated into the CRISPR repeat cassettes; (ii) expression and processing of the precursor CRISPR RNA (pre-crRNA) into short guide CRISPR RNAs (crRNAs); and (iii) interference, when the alien DNA or RNA is targeted by a complex containing a crRNA guide and a set of Cas proteins [for review, see (15)]. Below, we focus on the basic building blocks of the distinct types of CRISPR-Cas systems and summarize current ideas on the origin and evolution of this system.
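The three stages of CRISPR-Cas immunity can be sketched as a toy model. It is a deliberate simplification under stated assumptions: sequences are plain strings, a guide is simply a copy of a spacer, matching is an exact substring search on one strand, and PAM recognition, repeat sequences and the Cas machinery are not modelled. The class name `CrisprLocus` and its methods are hypothetical. The one biological detail retained is that new spacers are added at the leader-proximal end of the array.

```python
class CrisprLocus:
    """Toy model of the adaptation, expression and interference stages (simplified)."""

    def __init__(self):
        self.spacers = []  # index 0 = leader-proximal, i.e. most recently acquired, spacer

    def adapt(self, invader_dna, start, length):
        """Adaptation: copy a protospacer from the invader into the array."""
        spacer = invader_dna[start:start + length]
        self.spacers.insert(0, spacer)  # new spacers are added at the leader end
        return spacer

    def crrnas(self):
        """Expression and processing: one mature guide per spacer (repeat flanks omitted)."""
        return list(self.spacers)

    def interfere(self, dna):
        """Interference: the DNA is targeted if any guide matches it exactly."""
        return any(guide in dna for guide in self.crrnas())

locus = CrisprLocus()
locus.adapt("ATGCGTACGTTAGC", 2, 6)        # acquires the spacer 'GCGTAC'
print(locus.interfere("TTTGCGTACTTT"))    # -> True  (re-infection by a matching invader)
print(locus.interfere("AAAACCCC"))        # -> False (no matching spacer yet)
```

The memory of a past infection resides entirely in the spacer list, which is why the acquired immunity is heritable: a daughter cell that copies the array also copies its spacers.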

Most of the Cas protein sequences evolve under relaxed 
purifying selection (78) and/or undergo accelerated evolu- 
tion resulting from the virus-host arms race [e.g. (79)]. 
Consequently, most of these sequences are weakly 
conserved in evolution so that conventional sequence 
comparison partitions the Cas proteins into >100 
families (18). However, advanced sequence analysis 
combined with structural comparison identifies conserved 
domains between Cas protein families that were originally 
considered unrelated and thus enables the identification of 
the major building blocks that are shared by different 
CRISPR-Cas types (Figure 4A) (18,34,64,77). The two proteins that are present in the great majority of CRISPR-Cas systems are Cas1 and Cas2, which together are required and sufficient for spacer integration (the adaptation phase of the CRISPR-Cas response) (80). The only CRISPR-Cas loci that lack the cas1 and cas2 genes are some Type III systems that co-exist with Type I systems within the same genome and apparently borrow the Cas1 and Cas2 proteins from the latter (18). Although both Cas1 and Cas2 are involved in adaptation, the Cas1 endonuclease, which adopts a unique α-helical fold (81), appears to possess all the required enzymatic activities, whereas Cas2 might perform a distinct function that is not mechanistically related to spacer acquisition (see discussion later in the text).

With the exception of Cas1, most of the common Cas proteins contain various versions of the RNA Recognition Motif (RRM) domain, a widespread RNA-binding domain that, in particular, comprises the core of diverse DNA and RNA polymerases (where it is denoted the Palm domain). Among the Cas proteins, different variants of the RRM domain are present in Cas2 (a toxin-like ribonuclease), in Cas10 (the so-called CRISPR polymerase, a protein that is homologous to polymerases and cyclases but whose actual biochemical activity remains unknown) and in the largest group of Cas proteins, known as the RAMP (Repeat-Associated Mysterious Proteins) superfamily (Figure 4B). In particular, all CRISPR-Cas systems of Type I and most of the systems of Type III include a dedicated ribonuclease for pre-crRNA processing that typically belongs to the Cas6 family of the RAMPs (82,83). In some cases, e.g. in CRISPR-Cas systems of Type I-C, the function of Cas6 is taken over by a catalytically active RAMP of the Cas5 family (84). In contrast, Type II CRISPR-Cas systems use an unrelated mechanism of pre-crRNA cleavage. This version of pre-crRNA processing requires the double-stranded RNA-specific RNase III, a specialized trans-encoded small RNA that is complementary to a single CRISPR repeat, and still unidentified domains of the Cas9 protein (18,69,85,86).

In Type I-E and I-F CRISPR-Cas systems, the endoribonuclease that catalyses the processing of the pre-crRNA is a subunit of a multisubunit (or multidomain) complex known as CASCADE (CRISPR-associated complex for antiviral defense) (87). The mature crRNA remains associated with the CASCADE complex, which scans the target DNA for a match and, once one is found, recruits the Cas3 protein that cleaves the target via its HD endonuclease domain (88). In Type III systems (at least in the model system from the archaeon Pyrococcus furiosus), the Cas6 endoribonuclease does not belong to the CASCADE complex, which apparently is not directly involved in processing but instead binds the mature crRNA (89,90). This distinction apart, the architectures of the CASCADE complexes in Type I and Type III CRISPR-Cas are similar and include a large subunit, a small subunit and a pair of RAMPs that belong to the Cas5 and Cas7 families (84,87,90-92) (Figure 4A). Despite the high level of sequence divergence and the structural rearrangements that are typical of many Cas proteins, there appear to be direct homologous relationships between the respective subunits of the Type I and Type III CASCADEs (77). A notable difference is that Type I CRISPR-Cas encompasses a single Cas7 protein that is present in several copies in the CASCADE, whereas in Type III systems, there are several paralogous Cas7-like proteins. In Type II CRISPR-Cas, a single large multidomain protein, Cas9, is responsible for all the functions that in Type I and Type III systems are performed by the CASCADE and the Cas3 protein (93).

The target DNA cleavage in Type I (88) and most likely in Type III systems (77) is catalysed by homologous HD family nucleases. In many Type III systems, the HD domain is fused to Cas10, the large subunit of the CASCADE-like complex, whereas in Type I systems, the most common protein architecture is Cas3, in which the HD domain is fused to a distinct helicase domain that is essential for the interference stage (88,94). Type II systems use an unrelated mechanism that involves two distinct nuclease domains, HNH and RuvC-like, both contained within the Cas9 protein (95). This mechanism involves a unique two-RNA structure in which the mature crRNA is base-paired with a trans-encoded small RNA (tracrRNA); this duplex directs Cas9 to the cognate DNA sequence, where the protein introduces double-stranded breaks. During this process, the HNH nuclease domain of Cas9 cleaves the strand of the target DNA that is complementary to the crRNA, whereas the RuvC domain cleaves the second strand (95).
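The strand assignment in the preceding paragraph can be made concrete with a toy Python sketch. It is not a model of Cas9 biochemistry. It assumes perfect base pairing and ignores the PAM and the tracrRNA, and the function names are hypothetical. Given a duplex (top strand written 5'->3') and a crRNA spacer, it reports which strand HNH is expected to cut (the one complementary to the crRNA) and which strand RuvC is expected to cut.

```python
# Toy sketch: assign the two Cas9 nuclease domains to the two DNA strands.
# Assumptions (not from the source): perfect pairing, no PAM check, no tracrRNA.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence, both written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


def assign_cas9_cleavage(top_strand: str, spacer_rna: str) -> dict:
    """Return {"HNH": strand, "RuvC": strand} for a duplex given by its top strand.

    The crRNA base-pairs with the strand that is complementary to it;
    HNH cuts that strand and RuvC cuts the other (displaced) strand.
    """
    spacer_dna = spacer_rna.upper().replace("U", "T")
    top = top_strand.upper()
    bottom = reverse_complement(top)
    if spacer_dna in top:
        # The crRNA reads like the top strand, so it pairs with the bottom strand.
        return {"HNH": "bottom", "RuvC": "top"}
    if spacer_dna in bottom:
        # The crRNA reads like the bottom strand, so it pairs with the top strand.
        return {"HNH": "top", "RuvC": "bottom"}
    raise ValueError("spacer matches neither strand of the target")


print(assign_cas9_cleavage("GGATCCAAGTTCGA", "AAGUUC"))  # {'HNH': 'bottom', 'RuvC': 'top'}
```

A spacer that matches neither strand raises an error instead of guessing. This mirrors the requirement for a cognate target sequence.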

The Cas1 endonuclease, the CASCADE subunits and the Cas3 helicase-nuclease are essential for the immune function of the respective CRISPR-Cas systems. In addition, the CRISPR-Cas loci encompass many other genes that encode proteins whose mechanistic role in adaptive immunity remains unclear but that belong to protein families implicated in other defense systems. These CRISPR-associated gene products include the ribonuclease Cas2, the RecB-like nuclease Cas4 and numerous representatives of the COG1517 superfamily of proteins containing helix-turn-helix and putative ligand-binding domains (34,77). Most of these proteins, in particular Cas2, contain domains that are predicted to be nucleases and toxins, suggesting a secondary role as associated immunity components [see details later in the text and (55)]. Finally, the functions of several Cas proteins remain completely obscure.

4368 Nucleic Acids Research, 2013, Vol. 41, No. 8

Taken together, the results of comparative sequence analysis, structural studies and experimental data suggest that, despite their remarkable complexity and diversity, all CRISPR-Cas systems use the same architectural and functional principles and, given the conservation of the principal building blocks, share a common ancestry (Figure 4A). It is notable, however, that some of the essential components of the CRISPR-Cas systems can be replaced either by homologous proteins, such as the substitution of Cas5 for Cas6 in Type I-C CASCADE complexes, or by non-homologous but functionally analogous proteins, such as the substitution of the HNH and RuvC-like domains of Cas9 for the HD nuclease.

Under the recently proposed parsimonious evolutionary scenario, only a few evolutionary events would suffice to explain the emergence of CRISPR-Cas system types and subtypes (55). Furthermore, comparison of the recently solved structures of all major components of the CASCADE complex suggests that the RAMPs and the small subunits might have evolved from an ancestral large subunit resembling the Cas10 protein, which contains two RRM domains and an alpha-helical domain resembling the small subunit (96,97). The Cas10 protein (the large subunit of Type III CRISPR-Cas systems) could have evolved from an ancestral RRM (Palm) domain-containing polymerase or cyclase and, combined with the HD domain, might have originally functioned as a CRISPR-independent defense (innate immunity) system (55). The Cas1-Cas2 module might have originally functioned independently as a TA system (see discussion later in the text). Joining this module with the hypothetical ancestral CASCADE-HD system might have led to the emergence of the adaptation stage and, accordingly, the transformation of an innate immunity mechanism into one for adaptive immunity.

The ancestral Cas10-like protein and the entire ancestral, subtype III-like CRISPR-Cas system most likely evolved in hyperthermophilic archaea and were subsequently transferred horizontally to bacteria. Indeed, this variant of the CRISPR-Cas system is (nearly) universal in archaeal hyperthermophiles, in sharp contrast to mesophilic archaea and bacteria, fewer than 50% of which carry any form of CRISPR-Cas (18,77,98). In accord with this scenario, a recent mathematical modeling study has shown that the benefits of adaptive immunity are substantially greater under conditions of limited virus mutability, which seem to be characteristic of hyperthermophilic habitats (99).



Putative defense systems associated with prokaryotic Argonaute homologs

Another putative defense system that remains to be experimentally characterized centres around prokaryotic homologues of the slicer nuclease Argonaute (pAgo), the central component of the eukaryotic RNAi system (100). In all, 189 pAgo sequences have been identified in complete or draft genomes that represent most of the major branches of archaea and bacteria. For bacterial pAgos from Aquifex aeolicus and Thermus thermophilus, site-specific DNA-guided endoribonuclease activity has been demonstrated in vitro (101,102), but the natural target and the source of the guide DNA molecule(s) remain to be determined. The pAgos can be classified into two large monophyletic groups: the 'long' form, which contains a PAZ (oligonucleotide-binding) domain and a PIWI (active or inactivated ribonuclease) domain, and the 'short' form, which lacks the PAZ domain (100). Almost all pAgos that lack a PAZ domain appear to be inactivated, and the genes encoding these proteins are associated in putative operons with a variety of predicted deoxyribonucleases, including those from the PD-(D/E)xK, Sir2 and phospholipase D superfamilies. Furthermore, a strong association of the pAgo gene with defense islands has been demonstrated (100). Thus, it can be hypothesized that the PAZ domain-containing pAgos directly destroy virus or plasmid transcripts via their endoribonuclease activity, whereas the apparently inactivated PAZ-lacking pAgos could be structural subunits of protein complexes that contain endonucleases targeting DNA. An alternative possibility is that pAgo represents a distinct ABI system (see later in the text) that targets host nucleic acids and causes death or dormancy of the infected cell. Regardless of the specific mechanisms, it is likely that pAgos are key components of a novel defense system that uses guide DNA or RNA molecules to cleave target nucleic acids (100).
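The long/short split and the two hypothesized roles can be summarized as a small decision table. The Python sketch below illustrates that logic only. The `PAgo` record, its `piwi_active` flag and the role strings are hypothetical annotations, not data from the source. It also does not encode the alternative ABI-like scenario mentioned above.

```python
from dataclasses import dataclass


@dataclass
class PAgo:
    name: str
    has_paz: bool       # PAZ (oligonucleotide-binding) domain present
    piwi_active: bool   # hypothetical annotation: PIWI catalytic site predicted intact


def classify_pago(protein: PAgo) -> tuple:
    """Return (form, hypothesized role) following the long/short split in the text."""
    form = "long" if protein.has_paz else "short"
    if protein.has_paz and protein.piwi_active:
        role = "direct cleavage of virus or plasmid transcripts"
    elif not protein.has_paz and not protein.piwi_active:
        role = "structural subunit of a complex with an associated DNA nuclease"
    else:
        # Combinations the text does not discuss are left unassigned.
        role = "unassigned"
    return form, role


print(classify_pago(PAgo("example_short", has_paz=False, piwi_active=False)))
```

Keeping the unassigned branch explicit avoids forcing uncommon domain combinations into either hypothesis.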

SYSTEMS INDUCING PERSISTENCE AND PROGRAMMED CELL DEATH

Toxins-antitoxins 

Both Type I and Type II TA systems were originally characterized as 'addiction modules' that are encoded on plasmids and ensure their persistence in a host lineage after cell division (103,104). The toxin component of all TA systems is a protein that kills cells if expressed above a certain level, whereas the antitoxin component reversibly inactivates the toxin and/or regulates its expression, thereby preventing cell killing. Unlike the toxin, the antitoxin is metabolically unstable, so that, unless the antitoxin is continuously expressed, the free toxin can accumulate in amounts sufficient to kill the cell (25,105-108). Once the first genomes were sequenced, it became clear that numerous TA systems are present not only on plasmids but also on the chromosomes of bacteria and archaea (25,107).
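The kinetic argument (a stable toxin and a labile antitoxin) can be illustrated with a toy Python model. Its assumptions are not taken from the source: 1:1 tight binding, exponential decay, synthesis of both components stopping at plasmid loss (t = 0), and invented parameter values. Free toxin appears only once the antitoxin has decayed below the toxin level.

```python
import math

# Toy post-segregational killing model; every parameter value here is illustrative.
# Before plasmid loss the antitoxin is in excess; at t = 0 synthesis of both stops.


def free_toxin(t, toxin0=10.0, antitoxin0=20.0, d_toxin=0.01, d_antitoxin=0.5):
    """Free toxin at time t after plasmid loss, assuming 1:1 tight binding."""
    toxin = toxin0 * math.exp(-d_toxin * t)              # stable protein, slow decay
    antitoxin = antitoxin0 * math.exp(-d_antitoxin * t)  # labile protein, fast decay
    return max(toxin - antitoxin, 0.0)


def time_to_kill(threshold=5.0, dt=0.01, t_max=100.0, **params):
    """First sampled time at which free toxin reaches the killing threshold, else None."""
    for step in range(int(t_max / dt) + 1):
        t = step * dt
        if free_toxin(t, **params) >= threshold:
            return t
    return None


print(time_to_kill())                  # roughly 2.9 time units after plasmid loss
print(time_to_kill(d_antitoxin=0.01))  # None: an equally stable antitoxin never frees the toxin
```

The second call shows why antitoxin instability is essential. When both proteins decay at the same rate, the antitoxin excess persists and the cell is never killed.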

This surprising discovery stimulated a debate on the functions of the chromosomal TA systems and prompted a series of comparative genomic and experimental studies that resulted in the discovery of dozens of new TA systems. These findings and the current ideas on the biological roles of TA systems are summarized in several recent reviews (19,26,109-111). Briefly, it appears that the TA systems provide a mechanism for cell persistence that helps cells cope with various stress conditions (23,24,111). The majority of Type II toxins target different components of the translation system, especially mRNA (112,113), whereas Type I toxins affect membrane integrity (114). However, other targets of toxins have been identified as well, such as DNA gyrase (115) and the cell division GTPase FtsZ (116). Because Type I toxins have never been implicated in virus resistance and are not frequently observed in defense islands, we do not consider them here. Instead, we focus on Type II TA systems, particularly poorly characterized variants (Supplementary Table S3), and discuss the results of the recent efforts to identify new TA families using in silico approaches.

The computational approaches for prediction of new TA systems can be classified into three groups: (i) 'guilt by association', when a new toxin or antitoxin is predicted by virtue of linkage, in bacterial and archaeal genomes, to genes that belong to known antitoxin or toxin families (27,117); (ii) identification of gene pairs with characteristic features of TA systems, such as tight linkage of genes encoding small proteins, propensity for HGT and presence on plasmids or within genomic islands with other defense genes (5,27); and (iii) statistical analysis of whole-genome sequencing clones aimed at identification of genes that are unclonable (toxic) in E. coli (118).
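Approach (i) can be sketched in a few lines of Python. In this sketch genomes are reduced to ordered lists of gene-family labels. A family that is not yet annotated becomes a candidate antitoxin when it sits next to a known toxin, and a candidate toxin when it sits next to a known antitoxin. The seed sets use families named in this review (PIN, RelE, RHH, Xre). The function name, the `max_gap` and `min_support` parameters and the example gene orders are hypothetical.

```python
from collections import Counter

# Illustrative seed sets built from families named in the text; a real analysis
# would use curated profiles for all known toxin and antitoxin families.
KNOWN_TOXINS = {"PIN", "RelE"}
KNOWN_ANTITOXINS = {"RHH", "Xre"}


def predict_ta_candidates(genomes, max_gap=1, min_support=2):
    """Guilt-by-association sketch.

    genomes: mapping genome name -> ordered list of gene-family labels.
    Returns {(family, predicted_role): number_of_supporting_loci} for
    families supported by at least `min_support` loci.
    """
    support = Counter()
    for genes in genomes.values():
        for i, family in enumerate(genes):
            if family in KNOWN_TOXINS or family in KNOWN_ANTITOXINS:
                continue  # already annotated, nothing to predict
            window = genes[max(0, i - max_gap):i] + genes[i + 1:i + 1 + max_gap]
            roles = set()
            for neighbour in window:
                if neighbour in KNOWN_TOXINS:
                    roles.add("antitoxin")
                elif neighbour in KNOWN_ANTITOXINS:
                    roles.add("toxin")
            for role in roles:
                support[(family, role)] += 1
    return {key: n for key, n in support.items() if n >= min_support}


# Hypothetical gene orders; "DUF_X" stands for an uncharacterized family.
genomes = {
    "genome1": ["dnaA", "RelE", "DUF_X", "gyrB"],
    "genome2": ["DUF_X", "RelE"],
    "genome3": ["Xre", "COG2856"],
}
print(predict_ta_candidates(genomes))  # {('DUF_X', 'antitoxin'): 2}
```

With `min_support=1`, the housekeeping gene dnaA also becomes a candidate antitoxin simply because it is a neighbour of RelE. This illustrates why linkage alone needs the additional filters listed under (ii).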

The newly predicted TA systems are usually validated experimentally in E. coli by a kill/rescue assay in which overexpression of a toxin is expected to inhibit cell growth or kill the cell, whereas co-expression of the toxin and the antitoxin restores growth (117). However, a recent comprehensive study revealed numerous genes that appear to be unclonable in E. coli but do not meet the definition of TA systems, including many metabolic enzymes and informational genes such as ribosomal proteins (118,119). Although not all of these genes form the two-gene operons that are typical of TA systems, these findings indicate that dosage imbalance or toxicity of an intermediate substrate can make a gene toxic, and that this toxicity can be mitigated by proper regulation or by co-expression of an enzyme that uses the toxic product, mimicking TA behaviour. Thus, prediction of new TA systems from experimental results obtained with this approach requires caution and should involve assessment of the known and predicted functions and operonic organization of the candidate genes. Several experimentally validated TA systems (e.g. GinA and GinC) do not form evolutionarily conserved two-gene operons, suggesting modes of action distinct from the typical toxin-antitoxin mechanism (120). For example, GinA, a close homologue of the phage Mu host-nuclease inhibitor protein Gam, which inhibits RecBCD binding to dsDNA ends (121), and its 'antitoxin' Sak, a single-strand annealing protein (122), are often linked to other enzymes involved in recombination and repair (120). Accordingly, it appears most likely that GinA and GinC are involved in repair-related functions as well. These complications associated with the interpretation of the guilt-by-association predictions and the standard validation experiments indicate that additional experimental approaches are required to determine whether some recently identified systems are bona fide TA systems.

Additional examples of poorly characterized (predicted) TA systems are given in Supplementary Table S3. One of the most abundant of the predicted TA systems, which is particularly common in hyperthermophilic archaea, consists of a HEPN domain-containing protein and a minimal nucleotidyltransferase (MNT). Of the two components of this TA system, the HEPN domain protein is likely the toxin (118), predicted to function as an RNase that probably targets an RNA during translation (54,55), whereas the MNT is the antitoxin. Although the HEPN-MNT module shares all the typical characteristics of TA systems (27), the molecular mechanism of this system, and in particular the role of the nucleotidyltransferase activity of the antitoxin, remains unclear. The HEPN proteins in these systems belong to two groups, one of which is over-represented in thermophiles and the other in mesophiles (27). The HEPN and MNT domains are often fused to each other, which is not typical of other TA pairs. Furthermore, the paRep1/paRep8 (Pyrobaculum aerophilum repetitive) family of HEPN domains, which is represented almost exclusively in thermophiles and is specifically expanded in crenarchaea, is not associated with MNT; therefore, it remains to be determined whether these proteins are toxins of a distinct family of TA systems that use a still unidentified antitoxin.

Another two-component system in which one of the proteins is a predicted nucleotidyltransferase is
DUF1814-COG5340. More than 700 occurrences of this 
system were detected in 430 sequenced genomes of most 
major lineages of archaea and bacteria including several 
Mycoplasma species with small genomes. Homology of 
the DUF1814 family with the ABI AbiG (123) and AbiE 
families (124) has been demonstrated (5). In this case, 
however, the nucleotidyltransferase (DUF1814) appears 
to function as the toxin, whereas the COG5340 protein 
that contains a predicted HTH domain is the antitoxin 
[(5), see also Supplementary Table S4]. Both ABI systems 
appear to act at the stage of phage DNA replication, but 
their molecular mechanisms remain unknown (22). 

Yet another putative new toxin is COG2856, a metzincin superfamily protease associated with a potential antitoxin, an HTH-domain protein of the Xre family that is often fused to the protease (125). These putative operons are
abundant in bacterial and archaeal genomes, phages and 
plasmids, with lineage-specific expansions in several 
bacteria. Interestingly, in the bacterium Deinococcus 
radiodurans, a COG2856 gene (irrE) is a major radiation 
resistance determinant (126). 

Comprehensive comparative genomic analysis of the 
distribution and co-occurrence of known and predicted 
families of toxins and antitoxins leads to the following 
principal conclusions: 

• The abundance of TA systems in the genomes scales superlinearly with the genome size (5,27).

• So far, no TA systems have been detected in most endosymbionts and, among archaea, in Thermoplasmatales, several methanotrophs with small genomes, and the only known symbiotic archaeon, Nanoarchaeum equitans (27,117,127,128).

• The distribution of TA systems across phyla is distinctly non-uniform, with many systems significantly over- and under-represented in various taxa (27,117).

• Genomic occurrence of TA systems shows exceptional variability even in closely related genomes (27,117).

• TA systems are prone to HGT and can be considered 
a part of the prokaryotic mobilome (27). 

• The network of associations between different families 
of toxins and antitoxins contains a giant connected 
component and only a few isolated systems 
(Figure 5). The existence of such a strongly connected 
network is due to the modularity of the TA systems 
whereby toxins and antitoxins typically can have more 
than one partner. The principal hubs of the TA 
systems network are the PIN and RelE toxins and 
the RHH and Xre antitoxins (Figure 5) (27). 

• The high prevalence of stand-alone toxin and antitoxin genes (>50% of the genes in the largest families do not belong to TA pairs) suggests potential in trans interactions between toxins and antitoxins that remain to be discovered experimentally (27,117,128).
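The network described in the list above (pairs kept when five or more operons are observed, a giant connected component, hub families with many partners) can be reproduced on toy data with the Python sketch below. The operon counts, the families ToxA, AntiB and AntiC, and the resulting topology are all invented for illustration and are not the counts behind Figure 5. Only the five-operon cut-off comes from the figure legend.

```python
from collections import defaultdict, deque


def build_ta_network(operon_counts, min_operons=5):
    """Undirected toxin-antitoxin graph; keep pairs seen in >= min_operons operons."""
    graph = defaultdict(set)
    for (toxin, antitoxin), n in operon_counts.items():
        if n >= min_operons:
            t, a = ("toxin", toxin), ("antitoxin", antitoxin)
            graph[t].add(a)
            graph[a].add(t)
    return dict(graph)


def connected_components(graph):
    """Return the node sets of connected components, largest first (breadth-first search)."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        component, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return sorted(components, key=len, reverse=True)


def hubs(graph, top=2):
    """Nodes with the most distinct partners (ties broken alphabetically)."""
    return sorted(graph, key=lambda node: (-len(graph[node]), node))[:top]


# Invented counts; ToxA, AntiB and AntiC are hypothetical families.
operon_counts = {
    ("PIN", "RHH"): 120, ("PIN", "Xre"): 40, ("PIN", "AntiC"): 12,
    ("RelE", "RHH"): 90, ("RelE", "Xre"): 25,
    ("HEPN", "MNT"): 80,
    ("ToxA", "AntiB"): 3,  # below the cut-off, so it is dropped
}
network = build_ta_network(operon_counts)
print([len(c) for c in connected_components(network)])  # [5, 2]
print(hubs(network, top=1))                             # [('toxin', 'PIN')]
```

Because the same toxin family pairs with several antitoxin families and vice versa, most nodes merge into one component. This is the modularity that produces the giant component.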

Taken together, all these findings indicate that the TA 
systems comprise an extremely complex, versatile and cer- 
tainly not fully investigated network of 'semi-selfish' 
mobile elements that permeates the prokaryotic world. 
The principal role of the TA systems in bacteria and 
archaea appears to be induction of dormancy or 
programmed cell death in response to stress, in particular 
virus infection. However, it is currently impossible to rule 
out that the TA systems perform additional cellular 
functions. 

ABI (phage exclusion) systems 

The ABI (phage exclusion) systems represent another widespread group of defense mechanisms that abrogate virus infection at different stages, often by causing death of the infected cell (21,22). Furthermore, some of the ABI systems are two-component modules with all the properties of TA systems (e.g. the Type III TA systems mentioned earlier). Numerous ABI systems were identified, mostly by genetic methods, in lactic acid bacteria and E. coli, but the molecular mechanism is known for only a few of them (21). Supplementary Table S4 briefly summarizes the available information on these systems together with the results of computational analysis that could aid further experimental study. These findings indicate extensive domain sharing between ABI and TA systems and support the observation that most of the systems of both classes act by inducing cell death or dormancy. For example, the two-component AbiG system mentioned above is predicted to function as a TA system (5). Many ABI proteins or domains, including AbiD, AbiF, AbiJ, AbiU2, AbiV and the C-terminal domain of AbiA, belong to the HEPN endoribonuclease superfamily and are predicted to target the translation system (54). A HEPN domain is also predicted to be responsible for the anticodon tRNase activity of PrrC and RloC [(54), Figure 6]. AbiI, a predicted ribonuclease H superfamily nuclease, has a similar potential. Several membrane ABI systems cause membrane leakage, similarly to Type I TA systems (129,130). Several ABI systems, including AbiU1, AbiL and AbiR, are often associated and might interact with R-M systems (5,131). Finally, there is a strong link with mobile elements through the reverse transcriptase domain of AbiA and AbiK proteins (132), although, unlike typical reverse transcriptases, AbiK catalyses non-templated synthesis of random-sequence DNA that remains covalently attached to the protein and contributes to ABI (133).

The ~30 currently known ABI systems come from only 
two model organisms, suggesting that they represent only 
a minor fraction of the total diversity of this type of 
defense modules in bacteria and archaea. Indeed, the 
analysis of selected defense islands reveals numerous 
uncharacterized gene families that could be candidates for ABI-like defense systems (5).



IMMUNITY-DORMANCY/SUICIDE COUPLING HYPOTHESIS

As mentioned earlier, at the deepest level, all archaeal and bacterial defense systems can be classified into two major groups that function on two contrasting principles: (i) immune systems that discriminate self DNA from non-self DNA and specifically destroy the foreign, in particular viral, genomes, whereas the host genome is protected and (ii) systems that induce dormancy or programmed cell suicide in response to infection. Most of the genomic loci that encode immunity systems such as CRISPR-Cas, R-M, DND or Pgl also encompass genes that encode toxins, in particular nucleases implicated in the induction of dormancy or cell death (Figure 6). The most common among these immunity-associated toxins are HEPN domain-containing (predicted) nucleases (Figure 6). In contrast, the immunity loci do not seem to encode antitoxins, at least not from well-characterized antitoxin families. So far, there is no indication that the toxins are mechanistically involved in the immune functions. These observations led to the immunity-dormancy/suicide coupling hypothesis, which posits that the antivirus response in prokaryotes involves decision-making steps at which the cell chooses the path to follow by sensing the course of virus infection (55).

Figure 5. A network graph of the relationships between different families of toxins and antitoxins. Known and predicted (magenta) toxins (red circles) and antitoxins (blue circles) and their operon organizations are shown. The edges connect gene families for which five or more two-component operons were identified; the thickness of an edge is proportional to the abundance of the respective operon.

According to the coupling hypothesis, the toxins associated with immune systems induce dormancy or cell suicide unless controlled by components of the respective immunity system that act as antitoxins. This type of coupling is illustrated by the activity of the E. coli anticodon nuclease PrrC, which interacts with the PrrI R-M system. The coupling of diverse immunity and dormancy/suicide systems in prokaryotes could have evolved under selective pressure to provide robustness to the antivirus response. It can be further proposed that the involvement of dormancy/suicide systems in the coupled antivirus response could take two distinct forms: (i) induction of a dormancy-like state in the infected cell to 'buy time' for the activation of adaptive immunity and (ii) dormancy or suicide, triggered by the failure of immunity, as the final recourse to prevent viral spread.
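The two proposed forms of coupling can be written as a toy decision table. The Python sketch below illustrates the logic of the hypothesis for a CRISPR-Cas-carrying cell. It is not a model of any measured regulatory circuit. The three boolean inputs, the `Outcome` labels and the function name are hypothetical simplifications.

```python
from enum import Enum


class Outcome(Enum):
    NO_RESPONSE = "no infection sensed"
    IMMUNITY = "immune system clears the infection"
    BUY_TIME = "transient dormancy while immunity is primed"
    LAST_RESORT = "dormancy or suicide after immunity fails"


def coupled_response(infected: bool, matching_spacer: bool, immunity_failed: bool) -> Outcome:
    """Toy decision table for the two hypothesized coupling modes."""
    if not infected:
        return Outcome.NO_RESPONSE
    if immunity_failed:
        # Mode (ii): immunity has failed, so toxins shut the cell down to stop viral spread.
        return Outcome.LAST_RESORT
    if not matching_spacer:
        # Mode (i): a new virus with no spacer yet; toxins induce dormancy to buy time.
        return Outcome.BUY_TIME
    return Outcome.IMMUNITY


print(coupled_response(infected=True, matching_spacer=False, immunity_failed=False))
```

Checking failure before spacer availability encodes the idea that suicide or dormancy is the final recourse, whatever the state of adaptation.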

The first route is likely to be realized through the activity of Cas2, a protein that is present in all CRISPR-Cas systems, is essential for adaptive immunity and is homologous to toxin interferases. Conceivably, this mechanism switches on when the CRISPR-Cas system encounters a new virus, so that the Cas1 protein has to detect and insert a new spacer. The dormancy-like response through the action of Cas2 and/or a COG1517 protein containing an effector domain, of which the most common are the HEPN and the PD-(D/E)xK (RecB-like) family nucleases, would prevent virus reproduction, allowing the host the time required to prime the immunity response, which could be a relatively slow and ineffective process. The same reasoning could apply to other self-non-self discrimination systems if their action is slower than the action of phage counter-defenses that block the immunity response. The second coupling mode is more straightforward. When an immunity system fails and/or the level of genotoxic stress increases, the cell uses the associated toxins to abrogate key cell processes, typically translation, resulting in persistence or cell death. The cell suicide in such a case can be considered altruistic, as it prevents infection of other bacteria or archaea within the same colony or community.

Although multiple associations of (predicted) toxins 
with prokaryotic immune systems have been observed 
(Figure 6), it seems likely that many more members of 
known toxin families as well as novel toxins remain to
be identified within immune system loci. Indeed, many 
of the toxins are highly diverged, small proteins and 
could be easily overlooked, especially when they are 
fused to larger proteins as distinct domains (5,27). 
Finally, in trans interactions between immunity systems 
and TA modules cannot be ruled out. 

The coupling hypothesis might apply not only to antivirus defense systems but more generally to any stress response systems, mimicking the hypothetical functions ascribed to TA systems. For example, the recently described bactericidal system (134), polymorphic virulence systems (58) and the Ter-dependent chemical stress response system (135) are linked with various nucleases that are likely to possess toxin properties. Finally, it cannot be ruled out that some of the genes associated with immune systems perform functions different from the induction of dormancy or programmed cell death, such as repair of the DNA, RNA or even protein damage that is incurred during the action of the immunity systems.

Figure 6. Examples of genomic loci encoding different immunity systems and containing HEPN and PD-(D/E)xK domains. The genes are depicted as colored block arrows. The HEPN domain is shown by a light green shape with a red outline. The PD-(D/E)xK (RecB-like) domain is shown by a yellow shape with a red outline. HEPN, higher eukaryotes and prokaryotes nucleotide-binding domain, predicted endoribonuclease (54); Sir2, ParB and PD-(D/E)xK, DEDD are nucleases from distinct superfamilies. CRISPR-Cas gene names follow the nomenclature and classification from (18); R-M names follow the nomenclature and classification from (38). (A) HEPN domain associations. (B) PD-(D/E)xK domain associations.

This immunity-dormancy/suicide coupling hypothesis implies many experimentally testable predictions. In particular, it can be predicted that the Cas2 protein present in all CRISPR-Cas operons is an mRNA-cleaving nuclease (interferase) that is activated at an early stage of virus infection to enable incorporation of virus-specific spacers into the CRISPR locus or to trigger cell suicide when the immune function of CRISPR-Cas systems fails. Similarly, toxin-like activity is predicted for components of numerous other defense loci.



CONCLUDING REMARKS 

Defense mechanisms in bacteria, in particular R-M 
systems and TA systems, have been known for decades. 
However, recent comparative genomic analysis followed 
by experimental testing of the predictions has vastly 
expanded the scope of defense systems in prokaryotes. 
This expansion is both quantitative, including the discovery of diverse R-M and TA systems, and qualitative, including the discovery of fundamentally new defense mechanisms, as was the case with the DND, Pgl and especially CRISPR-Cas systems. Given that genes encoding components of defense
systems often evolve fast, that many of these genes 
encode small proteins and that the available genomes 
only represent a small fraction of the actual bacterial 
and archaeal diversity, there is little doubt that 
numerous defense systems, probably more than already 
known, remain to be discovered. Moreover, some of 
these findings have the potential to reveal new classes of 
defense mechanisms as suggested, for example, by the pre- 
diction of the pAgo-centred defense system(s) that remain 
to be experimentally characterized. 

The prevalence of different defense systems in bacterial 
and archaeal taxa shows pronounced trends, with four 
large groups of organisms being readily distinguishable 
with respect to the overall abundance of defense systems 
and the prevalence of specific types of defense. Although 
understanding of some of these trends, such as the over-representation of CRISPR-Cas in hyperthermophiles, is starting to develop, the biological relevance of most
aspects of the phyletic distribution of defense systems 
remains to be discovered. 

Statistical analysis of the localization of genes encoding defense system components in bacterial and archaeal genomes shows highly significant clustering in defense islands. Although the evolution of defense islands remains to be investigated in detail, in general, they seem to emerge through a preferential attachment mechanism in genome regions characterized by a high rate of recombination and relaxed selection for the maintenance of local synteny. Although the formation of defense islands in itself is likely to be a non-adaptive, essentially neutral process, the islands become a 'playground' for rapid evolution and shuffling of genes and domains of the defense systems. Furthermore, in addition to known defense systems, defense islands contain numerous uncharacterized genes that can be considered candidates for the discovery of new defense mechanisms.
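The text does not specify a model for preferential attachment. The Python sketch below swaps in a generic "rich get richer" process, similar to a Chinese restaurant process, to show how such a mechanism concentrates genes into a few large clusters. Each new defense gene joins an existing island with probability proportional to the island's size, or founds a new island with a fixed weight. The function name and the `new_island_weight` parameter are hypothetical, and genomic positions and recombination rates are ignored.

```python
import random


def grow_defense_islands(n_genes, new_island_weight=1.0, seed=0):
    """Toy preferential-attachment model of defense-island growth.

    Each new defense gene joins an existing island with probability proportional
    to that island's current size, or founds a new island with weight
    `new_island_weight`. Returns island sizes, largest first.
    """
    rng = random.Random(seed)
    islands = []
    for _ in range(n_genes):
        weights = islands + [new_island_weight]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(islands):
            islands.append(1)      # the gene founds a new island
        else:
            islands[choice] += 1   # larger islands attract more genes
    return sorted(islands, reverse=True)


sizes = grow_defense_islands(200)
print(len(sizes), sizes[:3])
```

With the default weight, 200 genes typically end up in only a handful of islands. Raising `new_island_weight` weakens the clustering, which corresponds to random scattering of defense genes.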

The tight genomic association of immunity systems and the defense systems that induce dormancy or cell death suggests that these two major types of defense systems are often functionally coupled. Such coupling could manifest in cell death being triggered when the primary immunity mechanism fails, or in a persistence state being induced, potentially providing conditions for more effective and less damaging action of the immune systems. Which of these mechanisms is realized under what conditions, and how the defense decisions depend on various factors, remains to be studied. All the immune systems that act on the self-non-self discrimination principle possess at least one component (such as the restriction endonuclease, RE) that can act as a toxin, so that the entire system could cause cell death or persistence instead of immunity. One example of such conversion, where an R-M system becomes a TA system, has been experimentally studied (136).

The versatility of the defense systems is to a large extent supported by the combinatorial shuffling of their constituents. The prime case in point is the two-component TA systems, which form a strongly connected network owing to the fact that the same toxin family typically combines with more than one antitoxin family and vice versa.
Furthermore, the distinction between TA and ABI 
systems is starting to fade away. A more appropriate 
view of these systems should focus on toxins that are 
activated or inactivated by numerous different signals 
encoded either in cis or in trans. Thus, substantial revisions 
of the definitions and classification of these defense 
systems appear inevitable. 

Although the approaches for comparative genomic prediction and further experimental analysis of bacterial and archaeal defense systems have substantially advanced during the past few years, the study of viral counter-defense mechanisms is in its embryonic stage, despite the extensive experimental evidence that such systems are numerous and could either be generic or specifically target distinct host defense systems. For example, the RNA ligase encoded by phage T4 can repair tRNAs cleaved by the PrrC ACNase in E. coli (51), the Dmd protein of bacteriophage T4 functions as an antitoxin against E. coli LsoA and RnlA (137,138), and a short RNA gene from bacteriophage PhiTE functions as an antitoxin to the ToxIN system (139).

The recent advances in the study of bacterial and 
archaeal defense systems are uncovering the remarkable 
complexity of prokaryotic evolution that is in large part 
shaped by the virus-host arms race. Moreover, the newly 
discovered defense systems might eventually lead to breakthroughs in biotechnology comparable with those brought about by the discovery of the R-M systems.